ABSTRACT

A decoder is used in an end-to-end scalable video delivery system operable over heterogeneous networks. The decoder may be software-based and of low computational complexity, or may be implemented inexpensively in ROM hardware. The system utilizes a scalable video compression algorithm based on a Laplacian pyramid decomposition to generate an embedded information stream. At the receiving end, the decoder extracts from the embedded stream different streams at different spatial and temporal resolutions. Decoding a 160x120 pixel image involves only decompressing a base layer 160x120 pixel image. Decoding a 320x240 pixel image involves decompressing and up-sampling (e.g., interpolating) the base layer to yield a 320x240 pixel image, to which is added error data in a first enhancement layer following its decompression. To obtain a 640x480 pixel image, the decoder up-samples the up-sampled 320x240 pixel image, to which is added error data in a second enhancement layer, following its decompression. Because decoding requires only additions and look-ups from a table stored in a small (12 Kb) memory, decoding occurs in real-time. Subjective quality of the compressed images preferably is enhanced using perceptual distortion measures. The system also provides joint source-channel coding capability on heterogeneous networks. The look-up table or codebook includes the inverse perceptual weighting (preprocessed) and the inverse transform (preprocessed). Decoding permits the codewords within the look-up table codebook to include preprocessed color conversion, dithering, color palettization, edge-enhancement, decimation, and interpolation.

20 Claims, 3 Drawing Sheets 
DECODER FOR A SOFTWARE-IMPLEMENTED END-TO-END SCALABLE VIDEO DELIVERY SYSTEM

This application is a continuation of Ser. No. 08/424,703, filed Apr. 18, 1995, U.S. Pat. No. 5,742,892.

FIELD OF THE INVENTION

The present invention relates generally to client decoders for use with video delivery systems, and more specifically to client decoders for use in video delivery systems in which video may be delivered scalably, so as to maximize use of network resources and to minimize user-contention conflicts.

BACKGROUND OF THE INVENTION

It is known in the art to use server-client networks to provide video to end users, wherein the server issues a separate video stream for each individual client.

A library of video sources is maintained at the server end. Chosen video selections are signal processed by a server encoder, stored on digital media, and then transmitted over a variety of networks, perhaps on a basis that allows a remote viewer to interact with the video. The video may be stored on media that include magnetic disk and CD-ROM, and the stored information can include video, speech, and images. As such, the source video information may have been stored in one of several spatial resolutions (e.g., 160x120, 320x240, 640x480 pixels) and temporal resolutions (e.g., 1 to 30 frames per second). The source video may present bandwidths whose dynamic range can vary from 10 Kbps to 10 Mbps. The signal processed video is transmitted to the clients (or decoders) over one or more delivery networks that may be heterogeneous, i.e., have widely differing bandwidths. For example, telephone delivery lines can transmit at only a few tens of Kbps, an ISDN network can handle 128 Kbps, and Ethernet 10 Mbps, whereas ATM networks handle even higher transmission rates.

Although the source video has varying characteristics, prior art video delivery systems operate with a system bandwidth that is static or fixed. Although such system bandwidths are fixed, in practice the general purpose computing environment associated with the systems is dynamic, and variations in the networks can also exist. These variations can arise from the outright lack of resources (e.g., limited network bandwidth and processor cycles), contention for available resources due to congestion, or a user's unwillingness to allocate needed resources to the task.

Prior art systems tend to be very computationally intensive, especially with respect to decoding images of differing resolutions. An encoder may transmit a bit stream of, say, 320x240 pixel resolution, whereas the decoder requires 160x120 pixel resolution. Decoding, in the prior art, requires that several processes be invoked, including decompression, entropy coding, quantization, discrete cosine transformation and down-sampling. Collectively, these process steps take too long to be accomplished in real-time.

Color conversions, e.g., YUV-to-RGB, are especially computationally intensive in the prior art. In another situation, an encoder may transmit 24 bits, representing 16 million colors, but a recipient decoder may be coupled to a PC having an 8-bit display, capable of only 256 colors. The decoder must then dither the incoming data, which is a computationally intensive task. Prior art decoders also take too long to complete operations such as color palettization to operate in real-time.

Unfortunately, fixed bandwidth prior art systems cannot make full use of such dynamic environments and system variations. The result is slower throughput and more severe contention for a given level of expenditure for system hardware and software. When congestion (e.g., a region of constrained bandwidth) is present on the network, packets of transmitted information will be randomly dropped, with the result that no useful information may be received by the client.

Video is extremely storage intensive, and compression is necessary during storage and transmission. Although scalable compression would be beneficial, especially for browsing in multimedia video sources, existing compression systems do not provide desired properties for scalable compression. By scalable compression it is meant that a full dynamic range of spatial and temporal resolutions should be provided on a single embedded video stream that is output by the server over the network(s). Acceptable software-based scalable techniques are not found in the prior art. For example, the MPEG-2 compression standard offers scalability to a limited extent, but lacks sufficient dynamic range of bandwidth, is costly to implement in software, and uses variable length codes that require additional error correction support.

Further, prior art compression standards typically require dedicated hardware at the encoding end, e.g., an MPEG board for the MPEG compression standard. While some prior art encoding techniques are software-based and operate without dedicated hardware (other than a fast central processing unit), known software-based approaches are too computationally intensive to operate in real-time. For example, JPEG software running on a Sparcstation 10 workstation can handle only 2-3 frames/second, i.e., about 1% of the frame/second capability of the present invention.

Considerable video server research in the prior art has focused on scheduling policies for on-demand situations, admission control, and RAID issues. Prior art encoder operation typically is dependent upon the characteristics of the client decoders. Simply stated, relatively little work has been directed to video server systems operable over heterogeneous networks having differing bandwidth capabilities, where host decoders have various spatial and temporal resolutions.

In summary, there is a need for a video delivery system that provides end-to-end video encoding such that the server outputs a single embedded data stream from which decoders may extract video having different spatial resolutions, temporal resolutions and data rates. The resultant video compression would be bandwidth scalable and thus deliverable over heterogeneous networks whose transmission rates vary from perhaps 10 Kbps to 10 Mbps. Such a system should accommodate lower bandwidth links or congestion, and should permit the encoder to operate independently of decoder capability or requirements.

Preferably the decoder should be software-based (e.g., not require specialized dedicated hardware beyond a computing system) and should permit real-time decompression. Alternatively, the decoder should be implementable in hardware, using relatively inexpensive components. The system should permit user selection of a delivery bandwidth to choose the most appropriate point in spatial resolution, temporal resolution, data rate, and quality space. The system should also provide subjective video quality enhancement, and should include error resilience to allow for communication errors.
The present invention provides such a decoder for an end-to-end scalable video delivery system.

SUMMARY OF THE INVENTION 

In a first embodiment, the present invention provides a software-based, rapidly operating decoder of low computational complexity for use with an end-to-end scalable video delivery system whose software-based server-encoder operates independently of the capabilities and requirements of the software-based decoder(s). The decoder operates in conjunction with a central processor unit ("CPU") and relies upon stored look-up tables containing preprocessed operations including color conversion, dithering, color palettization, interpolation, decimation, edge enhancement, and the like. In a second embodiment, the decoder functions are permanently stored in a read-only memory ("ROM") that operates in conjunction with a relatively simple central processor unit.
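To illustrate what a stored look-up table of "preprocessed" color conversion and palettization can mean, the following Python sketch precomputes, once, the 8-bit palette index for every combination of 4-bit Y, U, and V values, so that the per-pixel decode cost is a single table lookup. The 4-bit quantization, the JFIF-style conversion coefficients, the 3-3-2 bit palette, and all function names are assumptions for illustration, not details taken from this document.

```python
# Hypothetical sketch: fold YUV-to-RGB conversion and 8-bit palettization into
# one precomputed table so that, at decode time, each pixel costs one lookup.
# Assumptions (not from the source): 4-bit quantized Y, U, V inputs,
# JFIF-style conversion coefficients, and a 3-3-2 bit RGB palette.


def _clamp(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def build_yuv_to_rgb332_table() -> bytes:
    """Precompute a 16 x 16 x 16 = 4096-entry table of palette indices."""
    table = bytearray(16 * 16 * 16)
    for qy in range(16):
        for qu in range(16):
            for qv in range(16):
                # Representative 8-bit values for each 4-bit level
                # (chroma level 8 maps exactly to the neutral value 128).
                y, u, v = qy * 17, qu * 16, qv * 16
                r = _clamp(y + 1.402 * (v - 128))
                g = _clamp(y - 0.344136 * (u - 128) - 0.714136 * (v - 128))
                b = _clamp(y + 1.772 * (u - 128))
                # Pack 3 bits of red, 3 bits of green, and 2 bits of blue.
                table[(qy << 8) | (qu << 4) | qv] = ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)
    return bytes(table)


TABLE = build_yuv_to_rgb332_table()


def yuv_to_palette_index(y: int, u: int, v: int) -> int:
    """Decode-time path: keep the top 4 bits of each component, then look up."""
    return TABLE[((y >> 4) << 8) | ((u >> 4) << 4) | (v >> 4)]
```

Under these assumptions the table occupies 4096 bytes; for instance, `yuv_to_palette_index(255, 128, 128)` yields 255 (white) and `yuv_to_palette_index(0, 128, 128)` yields 0 (black).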

The encoder uses a scalable compression algorithm based upon Laplacian pyramid decomposition. An original 640x480 pixel image is decimated to produce a 320x240 pixel image that is itself decimated to yield a 160x120 pixel base image that is encoder-transmitted. This base image is then compressed to form a 160x120 pixel base layer that is decompressed and up-sampled to produce an up-sampled 320x240 pixel image. The up-sampled 320x240 pixel image is then subtracted from the 320x240 pixel image to provide an error image that is compressed and transmitted as a first enhancement layer. The 160x120 pixel decompressed image is also up-sampled to produce an up-sampled 640x480 pixel image that is subtracted from the original 640x480 pixel image to yield an error image that is compressed and transmitted as a second enhancement layer.
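The encoding steps in the preceding paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the decimation filter (2x2 averaging), the up-sampler (pixel replication instead of interpolation), an identity stand-in for the DCT and TSVQ coding, and the function names are all hypothetical. It does follow the text in computing the second enhancement layer against the twice up-sampled decoded base image.

```python
# Minimal sketch of the three-layer split described above (hypothetical helper
# names). Assumptions not taken from the source: images are lists of rows of
# ints, decimation is 2x2 averaging, up-sampling is pixel replication rather
# than true interpolation, and DCT + TSVQ coding is replaced by an identity.

Image = list[list[int]]


def decimate(img: Image) -> Image:
    """Halve both dimensions by averaging each 2x2 neighborhood."""
    return [
        [(img[2 * r][2 * c] + img[2 * r][2 * c + 1]
          + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) // 4
         for c in range(len(img[0]) // 2)]
        for r in range(len(img) // 2)
    ]


def upsample(img: Image) -> Image:
    """Double both dimensions by repeating every pixel and every row."""
    return [[px for px in row for _ in range(2)] for row in img for _ in range(2)]


def subtract(a: Image, b: Image) -> Image:
    """Pixel-wise difference a - b (the error image)."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def code_and_decode(img: Image) -> Image:
    """Stand-in for DCT + perceptual weighting + TSVQ and its inverse."""
    return [row[:] for row in img]


def encode_layers(original: Image) -> tuple[Image, Image, Image]:
    """Return (base layer, first enhancement error, second enhancement error)."""
    mid = decimate(original)          # e.g. 640x480 -> 320x240
    base = decimate(mid)              # 320x240 -> 160x120
    base_hat = code_and_decode(base)  # the base image the decoder will see
    up_once = upsample(base_hat)      # 160x120 -> 320x240
    err1 = subtract(mid, up_once)                 # first enhancement layer
    err2 = subtract(original, upsample(up_once))  # second enhancement layer
    return base, err1, err2
```

Because the error images are formed against the decoded base image rather than the uncoded one, any coding loss in the base layer is carried into the enhancement layers, where the decoder can correct it.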

Collectively, the base layer and the first and second enhancement layers comprise the single embedded bitstream that may be multicast over heterogeneous networks that can range from telephone lines to wireless transmission. Packets within the embedded bitstream preferably are prioritized with bits arranged in order of visual importance. The resultant bitstream is easily rescaled by dropping less important bits, thus providing a bandwidth scalability dynamic range from a few Kbps to many Mbps. Further, such an embedded bitstream permits the server system to accommodate a plurality of users whose decoder systems have differing characteristics. The transmitting end also includes a market-based mechanism for resolving conflicts in providing an end-to-end scalable video delivery service to the user.

At the receiving end, the present invention comprises decoders, software-based or contained in ROM, of varying characteristics that extract different streams at different spatial and temporal resolutions from the single embedded bit stream. Decoding a 160x120 pixel image involves only decompressing the base layer 160x120 pixel image. Decoding a 320x240 pixel image involves decompressing and up-sampling (e.g., interpolating) the base layer to yield a 320x240 pixel image to which is added error data in the first enhancement layer following its decompression. To obtain a 640x480 pixel image, the decoder up-samples the up-sampled 320x240 pixel image, to which is added error data in the second enhancement layer, following its decompression.
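The three decoding cases can likewise be sketched in Python. The sketch assumes the layers have already been decompressed into integer arrays and again uses pixel replication in place of interpolation; the function names are hypothetical.

```python
# Minimal decoder-side sketch matching the three cases above (hypothetical
# names). Assumptions: layers are already decompressed into lists of rows of
# ints, and up-sampling is pixel replication standing in for interpolation.

Image = list[list[int]]


def upsample(img: Image) -> Image:
    """Double both dimensions by repeating every pixel and every row."""
    return [[px for px in row for _ in range(2)] for row in img for _ in range(2)]


def add(a: Image, b: Image) -> Image:
    """Pixel-wise sum: prediction plus decoded error data."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decode(base: Image, err1: Image, err2: Image, layers: int) -> Image:
    """layers=1: base only; 2: up-sample and add err1; 3: up-sample twice, add err2."""
    if layers == 1:
        return [row[:] for row in base]
    up_once = upsample(base)
    if layers == 2:
        return add(up_once, err1)
    if layers == 3:
        # As described above, the 640x480 path up-samples the up-sampled
        # base image and adds the second enhancement layer's error data.
        return add(upsample(up_once), err2)
    raise ValueError("layers must be 1, 2, or 3")
```

Given base and error layers from a lossless round trip, `decode` with `layers=3` reproduces the original image exactly; with a lossy coder the result is only an approximation.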

Because decoding requires only additions and look-ups from a table stored in a small (12 Kb) memory, decoding occurs in real-time. Further, the decoder functions may be stored in a 12 Kb ROM that operates under control of a simple CPU. Subjective quality of the compressed images preferably is enhanced using perceptual distortion measures. The system also provides joint source-channel coding capability on heterogeneous networks. The look-up table or codebook includes the inverse perceptual weighting (preprocessed) and the inverse transform (preprocessed). This permits decoding to involve merely a look-up operation and addition. Decoding according to the present invention permits the codewords within the look-up table or codebook to include preprocessed color conversion, dithering, color palettization, edge-enhancement, decimation, and interpolation.
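The statement that decoding reduces to a look-up and an addition can be illustrated with a small Python sketch for 2x2 blocks: the inverse perceptual weighting and the inverse DCT are applied to every codeword once, offline, and the decoder then only indexes the resulting table and adds the entry to the up-sampled prediction. The fixed per-band weights, the row-major coefficient layout, and the function names are assumptions; this document describes input-dependent weights, which the sketch does not model.

```python
import math

# Hypothetical sketch of "look-up plus addition" decoding for 2x2 blocks.
# Assumptions (not from the source): fixed per-band weights, codewords stored
# as weighted DCT coefficients in row-major order, and rounding to integers.

_S = 1 / math.sqrt(2)
_C = ((_S, _S), (_S, -_S))  # orthonormal 2-point DCT basis (rows = basis vectors)


def inverse_dct_2x2(coeffs):
    """Return the 2x2 pixel block X = C^T Y C for coefficients Y (row-major)."""
    y = ((coeffs[0], coeffs[1]), (coeffs[2], coeffs[3]))
    tmp = [[sum(_C[k][i] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    x = [[sum(tmp[i][k] * _C[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return tuple(int(round(v)) for row in x for v in row)


def build_table(codebook, weights):
    """Offline step: undo the perceptual weighting, then inverse-transform."""
    return [inverse_dct_2x2([c / w for c, w in zip(cw, weights)]) for cw in codebook]


def decode_block(table, index, prediction):
    """Decode-time step: one lookup, then one addition per pixel."""
    return tuple(p + e for p, e in zip(prediction, table[index]))
```

For example, with weights `(1.0, 0.5, 0.5, 0.25)`, the codeword `(20, 0, 0, 0)` becomes the flat block `(10, 10, 10, 10)` and `(0, 2, 0, 0)` becomes the horizontal ramp `(2, -2, 2, -2)`; none of that arithmetic is repeated at decode time.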

Other features and advantages of the invention will appear 
from the following description in which the preferred 
embodiments have been set forth in detail, in conjunction 
with the accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of an end-to-end scalable video 
system, according to the present invention;

FIG. 2 is a block/flow diagram depicting a software-based 
encoding technique for generating a scalable embedded 
video stream, according to the present invention; 

FIG. 3 is a block/flow diagram depicting a decoding 
technique to recover scalable video from a single embedded 
video stream, according to the present invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

FIG. 1 depicts an end-to-end scalable video delivery system including decoder(s) 40, 40' according to the present invention. Decoders 40, 40' may be software-based low computational complexity units, or may be implemented in hardware. A source of audio and video information 10 is coupled to a server or encoder 20. The encoder signal processes the information to produce a single embedded information stream that is transmitted via heterogeneous networks 30, 30' to one or more target clients or software-based decoder systems 40, 40', which use minimal central processor unit resources. Network transmission may be through a so-called network cloud 50, from which the single embedded information stream is multicast to the decoders, or transmission to the decoders 40' may be point-to-point.

The networks are heterogeneous in that they have widely 
varying bandwidth characteristics, ranging from as low as 
perhaps 10 Kbps for telephones, to 100 Mbps or more for 
ATM networks. As will be described, the single embedded 
information stream is readily scaled, as needed, to accom- 
modate a lower bandwidth network link or to adapt to 
network congestion. 

Server 20 includes a central processor unit ("CPU") with associated memory, collectively 55; a scalable video encoder 60 according to the present invention; a mechanism 70 for synchronizing audio, video and textual information; and a mechanism 80 for arranging the information processed by the scalable video encoder onto video disks 90 (or other storage media). Storage 100 is also provided for signal processed audio information. Software comprising the scalable video encoder 60 preferably is digitally stored within server 20, for example, within the memory associated with CPU unit 55.

An admission control mechanism 110 is coupled to the processed video storage 90, as is a communication error recovery mechanism 120 for handling bit errors or packet or cell loss. The decoder algorithm provides error resilience allowing for such communication errors. The server communicates with the heterogeneous network(s) through a network interface 130.

Scalable video encoder 60 preferably is implemented in software only (e.g., no dedicated hardware), and generates a single embedded information stream. Encoder 60 employs a new video coding algorithm based on a Laplacian pyramidal decomposition to generate the embedded information stream. (Laplacian pyramids are a form of compression known to those skilled in the art, and for that reason further details are not presented here.) The generated embedded stream allows server 20 to host decoders 40, 40' having various spatial and temporal resolutions, without the server having to know the characteristics of the recipient decoder(s).

Encoding is shown in FIG. 2. In overview, the base layer is coded by performing a discrete cosine transform ("DCT"), followed by perceptual weighting, followed by vector quantization. The coded base layer is interpolated to the next higher layer, where the error difference is taken with respect to the original image at that layer. The error is again coded using DCT followed by perceptual weighting, followed by vector quantization. The process is repeated for the next higher level image.

More specifically, as shown in FIG. 2, an original 640x480 pixel image 200 from source 10 is coupled to the scalable video encoder 60. At process step 210, this image is decimated (e.g., filtered and sub-sampled) to 320x240 pixels (image 220), and at process step 230 image 220 is decimated to produce a base layer 160x120 pixel image 240 for encoding by encoder 60.

For the 160x120 pixel base layer, encoding preferably is done on 2x2 blocks (i.e., two adjacent pixels on one line and two adjacent pixels on the next line defining the block) with DCT followed by tree-structured vector quantization ("TSVQ") of the results of that transform. For the 320x240 first enhancement layer, encoding is done on 4x4 blocks, with DCT followed by TSVQ, and for the 640x480 pixel enhancement layer, encoding is done on 8x8 blocks, again with DCT followed by TSVQ.

At step 250, the 160x120 pixel base image 240 is compressed to form a 160x120 pixel base layer 260 and then at step 270 is decompressed. The resulting decompressed image 280 is up-sampled by interpolation step 290 to produce an up-sampled 320x240 pixel image 300.

Thus, the present invention uses vector quantization across transform bands to embed coding to provide bandwidth scalability with an embedded bit stream. Vector quantization techniques are known in the art. See, for example, A. Gersho and R. M. Gray, "Vector Quantization and Signal Compression", Kluwer Academic Press, 1992.

Embedded coding and vector quantization may each be performed by tree-structured vector quantization methods, e.g., by a successive approximation version of vector quantization ("VQ"). In ordinary VQ, the codewords lie in an unstructured codebook, and each input vector is mapped to the minimum distortion codeword. Thus, VQ induces a partition of an input space into Voronoi encoding regions. By contrast, when using TSVQ, the codewords are arranged in a tree structure, and each input vector is successively mapped (from the root node) to the minimum distortion child node. As such, TSVQ induces a hierarchical partition, or refinement, of the input space as the depth of the tree increases. Because of this successive refinement, an input vector mapping to a leaf node can be represented with high precision by the path map from the root to the leaf, or with lower precision by any prefix of the path.

TSVQ produces an embedded encoding of the data wherein, if the depth of the tree is R and the vector dimension is k, bit rates 0/k, ..., R/k can all be achieved. To achieve further compression, the index planes can be run-length coded followed by entropy coding. Algorithms for designing TSVQs have been studied extensively. The Gersho and Gray treatise cited above provides a background survey of such algorithms.

In the prior art, mean squared error typically is used as the distortion measure, with discrete cosine transforms ("DCT") being followed by scalar quantization. By contrast, the present invention performs DCT, after which whole blocks of data are subjected to vector quantization, preferably with a perception model. Subjectively meaningful distortion measures are used in the design and operation of the TSVQ. For this purpose, vector transformation is made using the DCT. Next, the following input-weighted squared error is applied to the transform coefficients:

$$d_w(y, \hat{y}) = \sum_{j=1}^{k} w_j(y)\,(y_j - \hat{y}_j)^2$$

In the above equation, $y_j$ and $\hat{y}_j$ are the components of the transformed vector $y$ and of the corresponding reproduction vector $\hat{y}$, whereas $w_j$ is a component of the weight vector, depending in general on $y$. Stated differently, distortion is the weighted sum of squared differences between the coefficients of the original transformed vector and the corresponding reproduced vector.

According to the present invention, the weights reflect human visual sensitivity to quantization errors in different transform coefficients, or bands. The weights are input-dependent to model masking effects. When used in the perceptual distortion measure for vector quantization, the weights control an effective stepsize, or bit allocation, for each band. When the transform coefficients are vector quantized with respect to a weighted squared error distortion measure, the role played by weights $w_1, \ldots, w_k$ corresponds to stepsizes in the scalar quantization case. Thus, the perceptual model is incorporated into the VQ distortion measure, rather than into a stepsize or bit allocation algorithm. This permits the weights to vary with the input vector, while permitting the decoder to operate without requiring the encoder to transmit any side information about the weights.

At summation step 310, the up-sampled 320x240 pixel image 300 is subtracted from the 320x240 pixel image 220 to give an error image 320. At step 330 the error image 320 is compressed and then transmitted as a first enhancement 320x240 pixel layer 340.

The 160x120 pixel decompressed image 280 is also up-sampled at step 350 to produce an up-sampled 640x480 pixel image 360. At summation step 370, the up-sampled 640x480 pixel image 360 is subtracted from the original 640x480 pixel image 200 to yield an error image 380. At step 390, the error image 380 is compressed to yield a second enhancement 640x480 pixel layer 400 that is transmitted. Collectively, layers 260, 340 and 400 comprise the embedded bit-stream generated by the scalable video encoder 60.

Thus, it is appreciated from FIG. 2 that a scalable video encoder 60 according to the present invention encodes three image resolutions. The transmitted base layer 260 has compressed data for the compressed 160x120 pixel image 240. The first enhancement layer 340 has error data for the compressed 320x240 pixel image 220, and the second enhancement layer 400 has error data for the compressed 640x480 pixel image 200.
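The input-weighted squared error and the greedy root-to-leaf TSVQ search described in this section can be sketched together in Python. The two-level tree of 2-D codewords and the masking-style weight function are made-up examples; only the distortion formula and the child-by-child search rule come from the text. The weights are computed from the input at the encoder, so the decoder needs only the path bits, consistent with the statement that no side information about the weights is transmitted.

```python
# Hypothetical sketch: greedy tree-structured VQ (TSVQ) search under the
# input-weighted squared error d_w(y, yhat) = sum_j w_j(y) * (y_j - yhat_j)**2.
# The tiny tree and the masking-style weight function are made-up examples.


def weighted_sq_error(y, yhat, w):
    """Weighted sum of squared coefficient differences."""
    return sum(wj * (yj - yh) ** 2 for yj, yh, wj in zip(y, yhat, w))


def tsvq_encode(y, tree, depth, weight_fn):
    """Descend from the root, choosing the child with lower weighted distortion.

    Returns the root-to-leaf path as a bit string; each prefix is a coarser code.
    """
    w = weight_fn(y)  # weights computed from the input itself; never transmitted
    path = ""
    for _ in range(depth):
        d0 = weighted_sq_error(y, tree[path + "0"], w)
        d1 = weighted_sq_error(y, tree[path + "1"], w)
        path += "0" if d0 <= d1 else "1"
    return path


def tsvq_decode(path, tree):
    """Any prefix of a path (including the empty one) names a codeword."""
    return tree[path]


# Node codewords keyed by their path from the root (2-D vectors, depth 2).
TREE = {
    "": (0, 0),
    "0": (-4, 0), "1": (4, 0),
    "00": (-6, -2), "01": (-2, 2), "10": (2, -2), "11": (6, 2),
}


def masking_weights(y):
    """Made-up masking model: a large first coefficient de-emphasizes the second."""
    return (1.0, 1.0 / (1.0 + abs(y[0])))
```

With the masking weights, the input `(-3, -1)` maps to path `"01"`, whereas uniform weights leave the two leaves tied and the search takes `"00"`. Truncating any path to a shorter prefix still names a valid, coarser codeword, which is the embedding property exploited for rate scalability.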
In the first stage of the compression encoder shown in FIG. 2, an image is transformed using DCT. The second stage of the encoder forms a vector of the transformed block. Next, the DCT coefficients are vector quantized using a TSVQ designed with a perceptually meaningful distortion measure. The encoder sends the indices as an embedded stream with different index planes. The first index plane contains the index for the rate 1/k TSVQ codebook. The second index plane contains the additional index which, along with the first index plane, gives the index for the rate 2/k TSVQ codebook. The remaining index planes similarly have part of the indices for the 3/k, 4/k, ..., R/k TSVQ codebooks, respectively.

Such encoding of the indices advantageously produces an embedded prioritized bitstream. Thus, rate or bandwidth scalability is easily achieved by dropping index planes from the embedded bit-stream. At the receiving end, the decoder can use the remaining embedded stream to index a TSVQ codebook of the corresponding rate.

Frame-rate scalability can be easily achieved by dropping frames, as at present no interframe compression is implemented in the preferred embodiment of the encoder algorithm. The algorithm further provides a perceptually prioritized bit-stream because of the embedding property of TSVQ. If desired, motion estimation and/or conditional replenishment may also be incorporated into the system.

Scalable compression according to the present invention is also important for image browsing, multimedia applications, transcoding to different formats, and embedded television standards. By prioritizing packets comprising the embedded stream, congestion due to contention for network bandwidth, central processor unit ("CPU") cycles, etc., in the dynamic environment of general purpose computing systems can be overcome by intelligently dropping less important packets from the transmitted embedded stream.

Information layout on the video disk storage system 90 (see FIG. 1) preferably involves laying the video as two streams, i.e., a base layer stream and an enhancement layer stream carrying the first and second enhancement layers. In practice, it is not necessary to store the error signal for the 640x480 resolution, since fairly good quality video can be provided by bilinear interpolation of the 320x240 resolution images.

The base layer data is stored as a separate stream from the enhancement layer data on disk subsystem 90. This allows the system to admit more users when fewer users choose to receive the enhancement layer data. As will now be described, the base layer data is stored hierarchically, data for each frame being stored together. Each frame has a set of index planes corresponding to different numbers of bits used for the lookup.

The compressed stream comprises look-up indices with different numbers of bits depending on the bandwidth and quality requirement. The look-up indices for each frame are stored as groups of index planes pre-formatted with application level headers for network transmission. Preferably the four most significant bits of the look-up indices are stored together as the first section of the frame block. Then four additional 1-bit planes of look-up indices are stored in sequence, as separate sections of the frame block, to provide look-up indices with 4, 5, 6, 7, 8 bits, respectively. The different look-up indices provide data streams with different bandwidth requirements.

With reference to FIG. 1, server 20 fetches the base signal frame block from the disk 90 and transmits the selected sections on the network 30, 30'. The re-packing of the bit planes into look-up indices is left to the receiving application at the client end of the system.

The error data is placed similarly as another data stream. For the error data, the most significant two bits of the look-up indices preferably are stored in the first section for each frame block in the bit stream. Then follow the second two bits of the look-up indices as the second section, followed in turn by four additional 1-bit sections of look-up indices, which together provide look-up indices with 2, 4, 5, 6, 7, 8 bits, respectively. Other encoding bit patterns might be used, however.

Preferably the video server uses RAID-like techniques to stripe each data stream across several drives. RAID design techniques are known in the art; e.g., see F. Tobagi, et al., "Streaming RAID - A disk array management system for video files," Proc. ACM Multimedia 1993. A RAID design allows for recovery from failure of any single disk without diminishing the capacity of the server. A RAID design removes any restriction on the number of active users of a given video title, as long as the multiple users can be accommodated within the server total bandwidth. That is, the usage can range from all active users receiving the same title at different offsets to all receiving different streams.

The streams of base and enhancement layer data preferably are striped in fixed size units across the set of drives in the RAID group, with parity placed on an additional drive. The selection of the parity drive is fixed since data updates are rare compared to the number of times the streams are read. The preferred striping policy keeps all of the look-up indices for an individual frame together on one disk. This allows for ease of positioning when a user single steps or fast-forwards the user's display, although there is a penalty in some loss of storage capacity due to fragmentation. Use of parity on the stripe level allows for quick recovery after a drive failure at the cost of using substantially more buffer space to hold the full exclusive-OR recovery data set.

In the present invention, the video server utilizes the planar bit stream format directly as the basis for the packet stream in the network layer. The embedded stream bits plus the application packet header are read from disk 90 and are transmitted on the network in exactly the same format. For example, in the preferred embodiment the base video layer has the four most significant bits of the look-up indices stored together. Thus, those bits are transmitted as one 2440 byte packet, and each additional index bit plane of the less significant bits is transmitted as a separate 640 byte packet. The header preferably contains a frame sequence number, nominal frame rate, size, a virtual time stamp, and a bit plane type specifier sufficient to make each packet an identifiable stand-alone unit. The server uses the self-identifying header to extract each bit plane group packet from the striped frame data retrieved from the disk subsystem.

The server also uses the header sequence and rate information as a means to pace the network transmission and disk read requests. The server uses a feedback loop to measure the processing and delivery time costs of the disk reads and of queueing the network packets for transmission. The server then uses these measures to schedule the next disk read and packet transmission activities to match the video stream frame rate (i.e., at X milliseconds in the future, start transmitting the next frame of video). The server can moderate the transmission rate based on slow-down/speed-up feedback from the decoder.

With further reference to FIG. 1, at the receiving end, decoder(s) 40 according to the present invention include a central processing unit ("CPU") 140 that includes a CPU per se and associated memory including cache memory.
Decoder(s) 40 further include a mechanism 145 for synchronizing audio and video information from the incoming embedded stream, as well as audio and video decoders 150, 160. The output from these decoders is coupled to a sound generator, e.g., a speaker, and to video displays 180.

If the decoders are software-based, the decoding process steps preferably are stored in memory, for example memory 140, for execution by the associated CPU. Alternatively, in applications where full CPU operations are not required, for example simple display applications, decoders according to the present invention may be implemented in hardware, e.g., in a simple CPU' and read-only memory ("ROM") unit 155. Within unit 155 is a relatively simple central processor unit CPU' that, collectively with the associated ROM, represents a hardware unit that may be produced for a few dollars.

Target decoder system 40 should be able to define at least 160x120, 320x240, and 640x480 pixel spatial resolutions, and at least 1 to 30 frames per second temporal resolution. Decoder system 40 must also accommodate bandwidth scalability with a dynamic range of video data from 10 Kbps to 10 Mbps. According to the present invention, video encoder 60 provides a single embedded stream from which different streams at different spatial and temporal resolutions and different data rates can be extracted by decoders 40, depending on decoder capabilities and requirements. However, as noted, encoder embedding is independent of the characteristics of the decoder(s) that will receive the single embedded information stream.

For example, decoder 40 can include search engines that permit a user to browse material for relevant segments, perhaps news, that the user may then select for full review. Within server 20, video storage 90 migrates the full resolution, full frame rate news stories based on their age and access history from disk to CD-ROM to tape, leaving lower resolution versions behind to support the browsing operation. If a news segment becomes more popular or important, the higher resolution can then be retrieved and stored at a more accessible portion of the storage hierarchy 90.

The decoders merely use the indices from the embedded bit-stream to look up from a codebook that is designed to make efficient use of the cache memory associated with the CPU unit 140. According to the present invention, video stream decoding is straightforward, and consists of loading the codebooks into the CPU cache memory and performing look-ups from the stored codebook tables. In practice, the codebook may be stored in less than about 12 Kb of cache memory.
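The look-up decoding described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `decode_base_layer`, the raster ordering of indices, and the 2x2 block size used in the usage example are assumptions. (A 2x2 block is consistent with the 38.4N Kbps entry of Table 1 for 8-bit indices at 160x120, but the text does not state the block size.)

```python
from typing import List, Sequence

Block = List[List[int]]   # one precomputed block of pixel values (rows)
Image = List[List[int]]


def decode_base_layer(indices: Sequence[int], codebook: Sequence[Block],
                      width: int, height: int) -> Image:
    """Rebuild an image using table look-ups only.

    Each index selects a precomputed pixel block from the decoder codebook
    and the block is copied into place. Blocks are assumed (hypothetically)
    to arrive in raster order: left to right, top to bottom."""
    bh, bw = len(codebook[0]), len(codebook[0][0])
    blocks_per_row = width // bw
    if len(indices) != blocks_per_row * (height // bh):
        raise ValueError("index count does not match the image size")
    image: Image = [[0] * width for _ in range(height)]
    for n, idx in enumerate(indices):
        block = codebook[idx]            # the only "decoding" step: a look-up
        y0 = (n // blocks_per_row) * bh
        x0 = (n % blocks_per_row) * bw
        for dy in range(bh):
            image[y0 + dy][x0:x0 + bw] = block[dy]
    return image


# Usage: a 4x2 image built from two 2x2 codebook blocks.
codebook = [[[0, 0], [0, 0]], [[255, 128], [64, 32]]]
print(decode_base_layer([1, 0], codebook, width=4, height=2))
# [[255, 128, 0, 0], [64, 32, 0, 0]]
```

Because no arithmetic happens per pixel, decode speed depends mainly on whether the codebook stays resident in cache, which is why the text stresses a codebook of about 12 Kb.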

As noted, alternatively, unit 155 may include a small ROM, e.g., about 12 Kb, that under control of a simple processor unit CPU' inexpensively provides decoder operations in applications where the full function of a more complex processor (such as the CPU associated with unit 140) is not needed. It is understood from FIG. 1 that a hardware-based (e.g., ROM-based) decoder embodiment would include unit 155 but not unit 140. Video decoder 160 uses a Laplacian pyramid decoding algorithm, and preferably can support up to three spatial resolutions, i.e., 160x120 pixels, 320x240 pixels, and 640x480 pixels. Further, decoder 160 can support any frame rate, as the frames are coded independently by encoder 60.

The decoding methodology is shown in FIG. 3. To decode a 160x120 pixel image, decoder 160 at method step 410 need only decompress the base layer 160x120 pixel image 260. The resultant image 430 is copied to a video monitor (or other device) 180. APPENDIX 1, attached hereto, is a sample of decompression as used with the present invention.



To obtain a 320x240 pixel image, decoder 160 first 
decompresses (step 410) the base layer 260, and then at step 
440 up-samples to yield an image 450 having the correct 
spatial resolution, e.g., 320x240 pixels. Next, at step 460, 
the error data in the first enhancement layer 340 is decom- 
pressed. The decompressed image 470 is then added at step 
480 to up-sampled base image 450. The resultant 320x240 
pixel image 490 is coupled by decoder 160 to a suitable 
display mechanism 180. 

To obtain a 640x480 pixel image, the up-sampled 320x240 pixel image 450 is up-sampled at step 500 to yield an image 510 having the correct spatial resolution, e.g., 640x480 pixels. Next, at step 520, the error data in the second enhancement layer 400 is decompressed. The decompressed image 530 is added at step 540 to the up-sampled base image 510. The resultant 640x480 pixel image 550 is coupled by decoder 160 to a suitable display mechanism 180.
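The FIG. 3 steps can be sketched in Python as below. This is a minimal sketch under stated assumptions: `upsample2x` uses pixel replication as a stand-in for the unspecified interpolation filter, `add_error` clamps sums to the 8-bit range (the text does not mention clamping), and all function names are hypothetical. As the text describes it, the 640x480 path up-samples image 450 (the up-sampled base) rather than the 320x240 result 490, so the first enhancement layer does not enter that path; the sketch follows the text.

```python
from typing import List, Optional

Image = List[List[int]]


def upsample2x(img: Image) -> Image:
    """Double width and height by pixel replication (a simple stand-in for
    the "up-sampling (e.g., interpolating)" named in the text)."""
    out: Image = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_error(pred: Image, err: Image) -> Image:
    """Add a decompressed error image, clamping to 0..255 (an assumption)."""
    return [[max(0, min(255, p + e)) for p, e in zip(pr, er)]
            for pr, er in zip(pred, err)]


def decode_level(base: Image, level: int, err1: Optional[Image] = None,
                 err2: Optional[Image] = None) -> Image:
    """level 0: 160x120, level 1: 320x240, level 2: 640x480."""
    if level == 0:
        return base                      # image 430: base layer only
    up1 = upsample2x(base)               # step 440 -> image 450
    if level == 1:
        if err1 is None:
            raise ValueError("level 1 needs the first enhancement layer")
        return add_error(up1, err1)      # step 480 -> image 490
    if level == 2:
        if err2 is None:
            raise ValueError("level 2 needs the second enhancement layer")
        up2 = upsample2x(up1)            # step 500: image 450 -> image 510
        return add_error(up2, err2)      # step 540 -> image 550
    raise ValueError("level must be 0, 1 or 2")
```

Each enhancement level costs one up-sampling pass and one addition per pixel on top of the table look-ups, which matches the text's observation that enhancement layers need only look-ups followed by an addition.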

As seen from FIG. 3 and the above description, it will be appreciated that obtaining the base layer from the embedded bit stream requires only look-ups, whereas obtaining the enhancement layers involves performing look-ups of the base and error images, followed by an addition process. The decoder is software-based and operates rapidly in that all decoder operations are actually performed beforehand, i.e., by preprocessing. The TSVQ decoder codebook contains the inverse DCT performed on the codewords of the encoder codebook.
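The preprocessing step just described, building the decoder codebook by applying the inverse DCT to each encoder codeword once, offline, can be sketched as follows. Assumptions: codewords are square blocks of DCT coefficients, and the transform is the orthonormal 2-D DCT-II (the text does not specify block size or normalization); the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: C[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        a = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([a * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def idct2(coeffs: Matrix) -> Matrix:
    """Inverse 2-D DCT. With forward Y = C X C^T, the inverse is X = C^T Y C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def build_decoder_codebook(encoder_codebook: List[Matrix]) -> List[Matrix]:
    """Run the inverse DCT once per codeword so that, at decode time, a
    look-up directly yields pixel values and no inverse transform is run."""
    return [idct2(cw) for cw in encoder_codebook]


# A DC-only 2x2 codeword decodes to a flat block of ones.
print(build_decoder_codebook([[[2.0, 0.0], [0.0, 0.0]]]))
```

The cost of the transform is paid once per codeword rather than once per block per frame, which is the source of the "no inverse block transforms at the decoder" property.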

Thus, at the decoder there is no need for performing inverse block transforms. Color conversion, i.e., YUV to RGB, is also performed as a pre-processing step by storing the corresponding color-converted codebook. To display video on a limited color palette display, the resulting codewords of the decoder codebook are quantized using a color quantization algorithm. One such algorithm has been proposed by applicant Chaddha et al., "Fast Vector Quantization Algorithms for Color Palette Design Based on Human Vision Perception," accepted for publication in IEEE Transactions on Image Processing.

According to the present invention, color conversion involves forming an RGB or YUV color vector from the codebook codewords, which are then color-quantized to the required alphabet size. Thus, the same embedded index stream can be used for displaying images on different alphabet decoders that have the appropriate codebooks with the correct alphabet size, e.g., 1-bit to 24-bit color.
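The color preprocessing above can be sketched as a one-time pass over the codebook. This is a sketch under stated assumptions: the YUV-to-RGB coefficients are the full-range JPEG/JFIF (BT.601-derived) ones, since the text does not say which matrix is used, and the palette mapping is plain nearest-neighbor search in RGB, a simple stand-in rather than the Chaddha et al. palette-design algorithm cited above. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
YUV = Tuple[float, float, float]


def clamp8(v: float) -> int:
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y: float, u: float, v: float) -> RGB:
    """Full-range YCbCr -> RGB with JFIF coefficients (assumed)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def nearest_palette_index(rgb: RGB, palette: Sequence[RGB]) -> int:
    """Closest palette entry by squared Euclidean distance in RGB."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(rgb, palette[i])))


def palettize_codebook(yuv_codebook: Sequence[Sequence[YUV]],
                       palette: Sequence[RGB]) -> List[List[int]]:
    """Precompute a codebook whose entries are palette indices, so that one
    look-up per block yields displayable pixels with no per-frame color math."""
    return [[nearest_palette_index(yuv_to_rgb(*p), palette) for p in block]
            for block in yuv_codebook]


# A 1-bit (two-entry) display: one codeword holding a dark and a light pixel.
print(palettize_codebook([[(16, 128, 128), (235, 128, 128)]],
                         [(0, 0, 0), (255, 255, 255)]))
```

Supporting a different display depth only means building another codebook with a larger palette; the embedded index stream itself is unchanged.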

On the receiving end, the video decoder 40, 40' is respon- 
sible for reassembly of the lookup indices from the packets 
received from the network. If one of the less significant 
index bit plane packets is somehow lost, the decoder uses the 
more significant bits to construct a shorter look-up table 
index. This yields a lower quality but still recognizable 
image. 
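The reassembly and graceful degradation described above can be sketched as follows. Assumptions: each received bit-plane is a list holding one bit per block index, planes are ordered most significant first, a lost plane is represented as `None` (in the text, the first packet of a layer actually carries several planes), and `codebooks_by_depth[d]` is a hypothetical layout holding the 2**d TSVQ tree nodes at depth d, so a d-bit prefix selects a coarser node of the same tree.

```python
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def reassemble_indices(planes: Sequence[Optional[Sequence[int]]]) -> Tuple[List[int], int]:
    """Rebuild look-up indices from bit-plane packets ordered MSB first.

    Decoding stops at the first missing plane, because less significant
    planes are meaningless without the planes above them.
    Returns (indices, number_of_bits_used)."""
    if not planes or planes[0] is None:
        raise ValueError("the most significant bit plane is required")
    indices = [0] * len(planes[0])
    depth = 0
    for plane in planes:
        if plane is None:
            break                        # a lost packet: keep the shorter index
        for i, bit in enumerate(plane):
            indices[i] = (indices[i] << 1) | bit
        depth += 1
    return indices, depth


def lookup(indices: Sequence[int], depth: int,
           codebooks_by_depth: Dict[int, Sequence[T]]) -> List[T]:
    """Shorter indices address shallower (coarser) nodes of the TSVQ tree."""
    table = codebooks_by_depth[depth]
    return [table[i] for i in indices]


# The third plane is lost, so only 2-bit indices are available.
idx, bits = reassemble_indices([[1, 0, 1], [1, 1, 0], None, [0, 0, 1]])
print(idx, bits)
```

Because the tree structure makes every prefix a valid codeword, losing a low-order plane lowers quality without corrupting the image, which is what lets networks drop those packets first.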

The use of separately identified packets containing index 
bit planes makes it possible for networks to easily scale the 
video as a side effect of dropping less important packets. In 
networks providing QOS qualifiers such as ATM, multiple 
circuits can be used to indicate the order in which packets 
should be dropped (i.e., the least significant bit plane packets 
first). In an IP router environment, packet filters can be 
constructed to appropriately discard less important packets. 
For prioritized networks, the base layer will be sent on the 
high priority channel while the enhancement layer will be 
sent on the low priority channel. To provide error resiliency, 
using a fixed-rate coding scheme with some added redun- 
dancy allows robustness in the event of packet loss. 

It will be appreciated that a server according to the present invention can support two usage scenarios: point-to-point demand (e.g., networks 30', decoders 40' in FIG. 1), or multicast (e.g., network cloud 50, networks 30, decoders 40 in FIG. 1).

In a point-to-point demand environment, each destination system decoder presents its specific requirements to the server. The server then sends the selected elements of the embedded stream across the network to the destination. A separate network stream per destination allows the user to have VCR-style functionality such as play/stop/rewind/fast forward/fast reverse. If congestion occurs on the network, the routers and switches can intelligently drop packets from the embedded stream to give a lesser number of lookup bits.

In a multicast environment, the server, which has no information about the destination decoders, outputs the entire embedded stream for the different resolutions and rates onto the network as a set of trees. In the preferred embodiment, there may be one to eleven trees, depending on the granularity of traffic control desired. The primary traffic management is performed during the construction of the multicast trees, by not adding branches of the trees carrying the less important bit streams to the lower bandwidth networks. The network in this case takes care of bandwidth mismatches by not forwarding packets to the networks which are not subscribed to a particular tree. Switches and routers can still react to temporary congestion by intelligently dropping packets from the embedded stream to deliver fewer bits of look-up.

The delivery system treats the audio track as a separate stream that is stored on disk 100 and transmitted across the network as a separate entity. The audio format supports multiple data formats from 8 kHz telephony quality (8-bit mu-law) to 48 kHz stereo quality audio (2-channel, 16-bit linear samples). In practice, many video clips may have 8 kHz telephone audio, to permit material distribution over medium-to-low bandwidth networks. The server can store separate high and low quality audio tracks, and transmit the audio track selected by the user. As the audio transits the network on a separate circuit, the audio can easily be given a higher QOS than the video streams. Rather than further load the networks with duplicate audio packets, as is known in the prior art, in the present invention the audio is ramped down to silence when packets are overly delayed or lost.

As the audio and video are delivered via independent mechanisms to the decoding system, the two streams must be synchronized by mechanism 145 for final presentation to the user. At the decoder, the receiving threads communicate through the use of a shared memory region, into which the sequence information of the current audio and video display units is written.

The human perceptual system is more sensitive to audio dropouts than to video drops, and audio is more difficult than video to temporarily reprocess. Thus, the decoder preferably uses the audio coder as the master clock for synchronization purposes. As the streams progress, the decoder threads post the current data items' sequence information onto a "blackboard" or scratchpad portion of memory associated with CPU unit 140. The slave threads (such as the video decoder) use the posted sequence information of the audio stream to determine when their data element should be displayed. The slave threads then delay until the appropriate time if the slave is early (e.g., more than 80 milliseconds ahead of the audio). If the slave data is too late (e.g., more than 20 milliseconds behind the audio), then it is discarded on the assumption that continuing to process late data will delay more timely data.

The video decoder can optionally measure the deviation from the desired data delay rate and send speed-up and slow-down indications back to the video server. This process synchronizes streams whose elements arrive in a timely fashion and does not allow a slow stream to impede the progress of the other streams.

In the event of scarcity of resources, some global prioritization of user requests must take place to guard against overload collapse. In a practical system, payment for services and resources may be used to define the overall value [...] total ordering of the user requests can be made, e.g., by admission control 110, and the less important requests can be dropped. The user specifies what he or she is willing to pay [...] required associated resources (network and disk bandwidth) are submitted to an electronic market, e.g., admission control 110, which uses micro-economic models to decide what amount of bandwidth resource is available to the user. Such techniques are known in the art, e.g., M. Miller, "Extending markets inward," Bionomics Conference, San Francisco, Calif., October 1994.

For the particular bandwidth required, a table is indexed to find the best possible combination of spatial resolution, frame rate and data rate (number of bits of look-up to be used) to give the best quality of decompressed video. Preferably such table is built using a subjective distortion measure, such as described by N. Chaddha and T. H. Y. Meng, "Psycho-visual based distortion measures for image and video compression," Proc. of Asilomar Conference on Signals, Systems and Computers, November 1993. Preferably, the user also has the option of specifying the spatial resolution, frame rate and bandwidth directly.

It will be appreciated that the described overall system combines a software-based encoder with an encoding compression algorithm, disk management, network transport, software-based decoder, and synchronization mechanism to provide an end-to-end scalable video delivery service. The service may be divided into three groups of components, comprising preprocessing, media server, and media player.

The preprocessing components include audio capture, video capture, video compression, and a data striping tool. The video is captured and digitized using single-step VCR devices. Each frame is then compressed off-line (non-real time) using the encoding algorithm. At present, it takes about one second on a SparcStation 20 workstation to compress a frame of video data, and single-step VCR devices can step at a rate of one frame per second, permitting overlap of capture and compression.

The audio data preferably is captured as a single pass over the tape. The audio and video time stamps and sequence numbers are aligned by the data striping tool as the video is stored to facilitate later media synchronization. The audio and video data preferably are striped onto the disks with a user-selected stripe size. In a preferred embodiment, all of the video data on the server uses a 48 kilobyte stripe size, as 48 kilobytes per disk transfer provides good utilization at peak load with approximately 50% of the disk bandwidth delivering data to the media server components.

The media server components include a session control agent, the audio transmission agent, and the video transmission agent. The user connects to the session control agent on the server system and arranges to pay for the video service and network bandwidth. The user can specify the cost he/she is willing to pay and an appropriately scaled stream will be provided by the server. The session control agent (e.g., admission control mechanism 110) then sets up the network delivery connections and starts the video and audio transmission agents. The session control agent 110 is the single point of entry for control operations from the consumer's remote control, the network management system, and the electronic market.
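The slave-thread rule described above (audio as the master clock, wait when a video frame is more than 80 ms early, discard it when more than 20 ms late) can be sketched as follows. Assumptions: the posted "sequence information" is expressed as presentation times in milliseconds, the blackboard is a lock-protected object, and `Blackboard` and `schedule_frame` are hypothetical names. The 80 ms and 20 ms values are the text's examples.

```python
import threading
from enum import Enum
from typing import Tuple

EARLY_MS = 80.0   # example threshold from the text: slave is early
LATE_MS = 20.0    # example threshold from the text: slave is late


class Action(Enum):
    DISPLAY = "display"
    WAIT = "wait"
    DISCARD = "discard"


class Blackboard:
    """Shared scratchpad where the audio (master) thread posts its position."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._audio_ms = 0.0

    def post_audio(self, ms: float) -> None:
        with self._lock:
            self._audio_ms = ms

    def audio_ms(self) -> float:
        with self._lock:
            return self._audio_ms


def schedule_frame(frame_ms: float, board: Blackboard) -> Tuple[Action, float]:
    """Decide what a slave (video) thread does with a frame whose
    presentation time is frame_ms. Returns (action, wait time in ms)."""
    lead = frame_ms - board.audio_ms()   # > 0: video is ahead of the audio
    if lead > EARLY_MS:
        return Action.WAIT, lead         # sleep until the frame is due
    if -lead > LATE_MS:
        return Action.DISCARD, 0.0       # too late: drop so newer data is not delayed
    return Action.DISPLAY, 0.0


board = Blackboard()
board.post_audio(1000.0)
print(schedule_frame(1100.0, board))
```

The asymmetric window reflects the text's reasoning: an early frame costs nothing to hold, whereas a late frame that is still processed pushes back every frame behind it.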

The audio and video transmission agents read the media 
data from the striped disks and pace the transmission of the 
data onto the network. The video transmission agent scales 
the embedded bit-stream in real-time by transmitting only 
the bit planes needed to reconstruct the selected resolution at 
the decoder. For example, a 320x240 stream with 8 bits of base and 4 bits of enhancement signal at 15 frames per second will transmit every other frame of video data, with all 5 packets for each frame of the base and only the two packets containing the four most significant bits of the enhancement layer, resulting in 864 kbps of network utilization. The server
sends the video and audio either for a point-to-point situa- 
tion or a multicast situation. 
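The 864 kbps figure in the example above can be reproduced with simple arithmetic. The block sizes are assumptions inferred from the tables, not stated in the text: a 2x2 base block reproduces the 38.4N Kbps entry of Table 1 (8-bit indices at 160x120), and a 4x4 enhancement block reproduces the 76.8N Kbps entry of Table 2 (8-bit base plus 8-bit enhancement). Under these assumptions, 4 enhancement bits give 57.6 Kb per frame, whereas the 4-bit row of Tables 2 and 3 lists 52.8N.

```python
from typing import Tuple


def layer_bits_per_frame(width: int, height: int, block: Tuple[int, int],
                         index_bits: int) -> int:
    """Bits for one layer of one frame: one look-up index per block."""
    bw, bh = block
    return (width // bw) * (height // bh) * index_bits


BASE_BLOCK = (2, 2)   # assumption inferred from Table 1 (38.4 Kb/frame at 8 bits)
ENH_BLOCK = (4, 4)    # assumption inferred from Table 2 (76.8 Kb/frame at 8 + 8 bits)


def stream_kbps(base_bits: int, enh_bits: int, fps: float) -> float:
    """Rate of a 160x120 base layer plus a 320x240 enhancement layer."""
    bits = layer_bits_per_frame(160, 120, BASE_BLOCK, base_bits)
    bits += layer_bits_per_frame(320, 240, ENH_BLOCK, enh_bits)
    return bits * fps / 1000.0


print(stream_kbps(8, 4, 15))  # 864.0
```

Halving the frame rate halves the rate linearly, which is why the tables express bandwidth as a multiple of the frame rate N.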

The media player components are the software-based or ROM-based video decoder 40, 40', the audio receiver, and a user interface agent. The decoder receives the data from the network, decodes it using look-up tables, and places the results onto the frame buffer. The decoder can run on any modern microprocessor unit without significantly loading the CPU. The audio receiver loops reading data from the network and queues the data for output to the speaker. In the event of audio packet loss, the audio receiver preferably ramps the audio level down to silence level and then back up to the nominal audio level of the next successfully received audio packet. The system performs media synchronization to align the audio and video streams at the destination, using techniques such as described by J. D. Northcutt and E. M. Kuerner, "System Support for Time-Critical Applications," Proc. NOSSDAV 91, Germany, pp. 242-254.
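The loss handling of the audio receiver described above (ramp down to silence, then back up to the level of the next good packet) can be sketched as follows. Assumptions: samples are 16-bit linear integers, packets have a fixed length, a lost packet is `None`, the ramps are linear over a hypothetical `ramp` sample count, and the receiver holds back at least `ramp` samples before playback so that the tail preceding a loss can still be faded. Function names are hypothetical.

```python
from typing import List, Optional, Sequence


def ramp_gain(n: int, start: float, end: float) -> List[float]:
    """Linear gain envelope of n steps from start to end (inclusive)."""
    if n <= 1:
        return [end] * n
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def conceal_loss(packets: Sequence[Optional[Sequence[int]]],
                 packet_len: int, ramp: int = 64) -> List[int]:
    """Join audio packets, replacing lost ones (None) with faded silence."""
    out: List[int] = []
    lost = False
    for pkt in packets:
        if pkt is None:
            if not lost and out:
                # Fade the already-buffered tail down to silence.
                k = min(ramp, len(out))
                for j, g in enumerate(ramp_gain(k, 1.0, 0.0)):
                    out[len(out) - k + j] = int(out[len(out) - k + j] * g)
            out.extend([0] * packet_len)     # silence for the missing packet
            lost = True
        else:
            samples = list(pkt)
            if lost:
                # Fade the next good packet back in to its nominal level.
                k = min(ramp, len(samples))
                for j, g in enumerate(ramp_gain(k, 0.0, 1.0)):
                    samples[j] = int(samples[j] * g)
            out.extend(samples)
            lost = False
    return out


print(conceal_loss([[100] * 4, None, [100] * 4], packet_len=4, ramp=4))
```

Ramping rather than cutting avoids the audible click of an abrupt step, and it costs no extra network traffic, unlike sending duplicate audio packets.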

End-to-end feedback is used in the on demand case to 
control the flow. In the multicast case, the destinations are 
slaved to the flow from the server with no feedback. The user 
interface agent serves as the control connection to the 
session agent on the media server passing flow control 
feedback as well as the user's start/stop controls. The user 
can specify the cost he or she is willing to pay and an 
appropriate stream will be provided by the system. 

A prototype system according to the present invention uses a video data rate that varies from 19.2 kbps to 2 Mbps depending on the spatial and temporal requirements of the decoder and the network bandwidth available. The PSNR varies from 31.63 dB to 37.5 dB. Table 1 gives the results for the decoding of a 160x120 resolution video on a SparcStation 20. It can be seen from Table 1 that the time required to get the highest quality stream (8-bit index) at 160x120 resolution is 2.45 ms per frame (sum of lookup and packing time). This corresponds to a potential frame rate of about 400 frames/sec.

TABLE 1

RESULTS FOR 160x120 RESOLUTION (DECODER)

No. of Bits   PSNR (dB)   Bandwidth as a function   CPU time per   Packing time
of Lookup                 of frame rate (N), Kbps   frame (ms)     per frame (ms)
4             31.63 dB    19.2N                     1.24 ms        0 ms
5             32.50 dB    24N                       1.32 ms        0.52 ms
6             34 dB       28.8N                     1.26 ms        0.80 ms
7             35.8 dB     33.6N                     1.10 ms        1.09 ms
8             37.2 dB     38.4N                     1.18 ms        1.27 ms



Similarly, Table 2 gives the results for the decoding of a 320x240 resolution video on a SparcStation 20. It can be seen from Table 2 that the time required to get the highest quality stream (8-bit base index and 8-bit first enhancement layer index) at 320x240 resolution is 7.76 ms per frame (sum of look-up and packing time). This corresponds to a potential frame rate of about 130 frames/sec.

TABLE 2

RESULTS FOR 320x240 RESOLUTION (8-BIT LOOKUP BASE)

No. of Bits   PSNR (dB)   Bandwidth as a function   CPU time per   Packing time
of Lookup                 of frame rate (N), Kbps   frame (ms)     per frame (ms)
2             33.72 dB    48N                       6.01 ms        0.385 ms
4             35.0 dB     52.8N                     6.04 ms        0.645 ms
5             35.65 dB    62.4N                     6.05 ms        0.92 ms
6             36.26 dB    67.2N                     6.08 ms        1.20 ms
7             36.9 dB     72N                       6.04 ms        1.48 ms
8             37.5 dB     76.8N                     6.09 ms        1.67 ms



Table 3 gives the results for the decoding of a 640x480 resolution video, again on a SparcStation 20. It can be seen from Table 3 that the time required to get the highest quality stream (8-bit base and 8-bit enhancement layer) at 640x480 resolution is 24.62 ms per frame (sum of lookup and packing time). This corresponds to a potential frame rate of about 40 frames/sec.



TABLE 3

RESULTS FOR 640x480 WITH 320x240 INTERPOLATED

No. of Bits   PSNR (dB)   Bandwidth as a function   CPU time per   Packing time
of Lookup                 of frame rate (N), Kbps   frame (ms)     per frame (ms)
2             33.2 dB     48N                       22.8 ms        0.385 ms
4             34 dB       52.8N                     22.87 ms       0.645 ms
5             34.34 dB    62.4N                     23.14 ms       0.92 ms
6             34.71 dB    67.2N                     22.93 ms       1.20 ms
7             35.07 dB    72N                       22.90 ms       1.48 ms
8             35.34 dB    76.8N                     22.95 ms       1.67 ms
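The "potential frame rate" figures quoted with Tables 1 to 3 follow from dividing one second by the per-frame decode time (lookup time plus packing time). The short sketch below reproduces them from the highest-quality rows; the exact values are about 408, 129 and 41 frames/s, which the text rounds to 400, 130 and 40. It ignores disk, network and display costs, so it is an upper bound on decode throughput only. The dictionary and function names are illustrative.

```python
# Lookup (CPU) time and packing time, in ms per frame, copied from the
# highest-quality rows of Tables 1-3.
HIGHEST_QUALITY_ROWS = {
    "160x120, 8-bit index (Table 1)": (1.18, 1.27),
    "320x240, 8-bit base + 8-bit enhancement (Table 2)": (6.09, 1.67),
    "640x480, 8-bit base + 8-bit enhancement (Table 3)": (22.95, 1.67),
}


def potential_fps(lookup_ms: float, packing_ms: float) -> float:
    """Frame rate if decoding were the only per-frame work."""
    return 1000.0 / (lookup_ms + packing_ms)


for name, (lookup_ms, packing_ms) in HIGHEST_QUALITY_ROWS.items():
    total = lookup_ms + packing_ms
    fps = potential_fps(lookup_ms, packing_ms)
    print(f"{name}: {total:.2f} ms/frame, about {fps:.0f} frames/s")
```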



Table 4 shows the results for each individual disk for 
160x120 resolution video. It can be seen that to get the 
highest quality stream (8-bit base) at 160x120 requires 5.60 
ms of CPU time and an average CPU load of 2% on a 
SparcStation 20 workstation. The average disk access time 
per frame is 16 ms. 



TABLE 4

RESULTS FOR 160x120 AT THE DISK SERVER

No. of Bits   Bandwidth as a function   CPU time per   Seek time   Avg. CPU
of Lookup     of frame rate (N), Kbps   frame (ms)     (ms)        Load
4             19.2N                     2.84 ms        16 ms       1%
5             24N                       3.67 ms        16 ms       1%
6             28.8N                     4.48 ms        14 ms       2%
7             33.6N                     4.92 ms        14 ms       2%
8             38.4N                     5.60 ms        16 ms       2%



Similarly, Table 5 shows the results for each individual disk for 320x240 resolution video. It can be seen that obtaining the highest quality stream (8-bit base and 8-bit enhancement layer) at 320x240 requires 12.73 ms of CPU time and an average CPU load of 1% on a SparcStation workstation. The average disk access time per frame is 18 ms.

TABLE 5

RESULTS FOR 320x240 AT THE DISK SERVER

No. of Bits   Bandwidth as a function   CPU time per   Seek time   Avg. CPU
of Lookup     of frame rate (N), Kbps   frame (ms)     (ms)        Load
2             48N                       10.47 ms       18 ms       6%
4             52.8N                     11.02 ms       16 ms       6%
5             62.4N                     11.55 ms       18 ms       6%
6             67.2N                     12.29 ms       20 ms       7%
7             72N                       12.55 ms       20 ms       7%
8             76.8N                     12.73 ms       18 ms       1%
Modifications and variations may be made to the disclosed embodiments without departing from the subject and spirit of the invention as defined by the following claims.

What is claimed is: 

1. A decoder that receives a network-transmittable server generated single embedded bit stream in pixel blocks that contains information for at least two spatial resolutions of video, said server encoding said embedded bit stream using discrete cosine transformation followed by tree-structured quantization, said decoder outputting from said bit stream scaleable video that is variable in at least one of spatial resolution, temporal resolution, and data rate, the decoder including:

a central processor unit coupled to a memory unit;

a look-up table, stored in said memory unit, including preprocessed decoded versions of quantized indexable representations of inverse discrete cosine transforms of image data used when codewords present in said embedded bit stream were created; and

a processor controlled by said central processor unit that processes said information contained in said embedded bit stream and decodes at least a first spatial resolution image by decompressing base layer data contained within said embedded bit stream;

wherein said decoder can decode a viewable image from data in said embedded bit stream from received codewords containing less than a number of bits representing a full-length codeword.

2. The decoder of claim 1, wherein said processor further decodes a second, higher, spatial resolution image by decompressing said base layer data to obtain a first intermediate image that is up-sampled to yield a first up-sampled image to which is added decompressed error data in a first enhancement layer contained in said embedded bit stream.

3. The decoder of claim 2, wherein said embedded bit stream contains at least three spatial resolutions, and wherein said processor further decodes a third image whose spatial resolution is higher than said second image by up-sampling said first up-sampled image to yield a second intermediate image to which is added decompressed error data in a second enhancement layer contained in said embedded bit stream.

4. The decoder of claim 2, wherein said memory unit stores at least one of the following (i) an algorithm used to carry-out said decoding, (ii) look-up data used in decompressing said base layer data, (iii) look-up data associated with said error data in said first enhancement layer, (iv) preprocessed data representing up-sampling of said first up-sampled image, and (v) preprocessed data associated with said error data in said second enhancement layer.
5. The decoder of claim 2, wherein said tree-structured vector quantization has a tree depth R and a vector dimension k;

wherein bitstream bit rates 0/k, . . . , R/k are provided for said embedded bit stream.

6. The decoder of claim 5, wherein said memory unit stores codewords arranged in a tree structure.

7. The decoder of claim 1, wherein said server encodes 
spatial resolution data in said embedded bit stream in pixel 
blocks, wherein: 

said decoder provides error correction for said embedded 
bit stream using inverse vector quantization followed 
by inverse discrete cosine transformation of at least 
some of said pixel blocks, said inverse vector quanti- 
zation and said inverse discrete cosine transformation 
being preprocessed and stored in said memory unit. 

8. The decoder of claim 1, wherein said vector quantiza- 
tion includes human perception modelling. 

9. The decoder of claim 1, wherein said memory unit 
stores codewords arranged in a tree structure. 

10. The decoder of claim 2, wherein: 

said first image has 160x120 pixel resolution, said second 
image has 320x240 pixel resolution, and wherein said 
third image has 640x480 pixel resolution. 

11. The decoder of claim 2, wherein: 

said processor decodes a 160x120 pixel image by decom- 
pressing base layer data contained within said embed- 
ded bit stream; 

said processor decodes a 320x240 pixel image by decompressing said base layer data to obtain a first intermediate image and up-samples said first intermediate image to yield a first up-sampled image to which is added decompressed error data in a first enhancement layer contained in said embedded bit stream; and

said processor decodes a 640x480 pixel image by 
up-sampling said first up-sampled image to yield a 
second intermediate image to which is added decom- 
pressed error data in a second enhancement layer 
contained in said embedded bit stream. 

12. A hardware-based decoder for use with a video delivery system whose server encodes an embedded bit stream in pixel blocks using discrete cosine transformation followed by tree-structured quantization, said embedded bit stream including information for at least two spatial resolutions and transmittable over at least one network, the decoder including:

a central processor unit coupled to a memory unit includ- 
ing a read-only memory storing preprocessed decoded 
versions of quantized indexable representations of 
inverse discrete cosine transforms of image data used 
by said server to create codewords present in said 
embedded bit stream; 

said preprocessed decoded versions including at least 
information used to decode a first spatial resolution 
image by decompressing base layer data contained 
within said embedded bit stream; 

wherein said decoder can decode a viewable image from 
data in said embedded bit stream from received code- 
words containing less than a number of bits represent- 
ing a full-length codeword. 

13. The decoder of claim 1 wherein said preprocessed 
decoded versions further include information used to decode 
a second, higher, spatial resolution image by decompressing 
said base layer data to obtain a first intermediate image that 
is up-sampled to yield a first up-sampled image to which is 
added decompressed error data in a first enhancement layer 
contained in said embedded bit stream. 
14. The decoder of claim 12, wherein said embedded bit stream contains at least three spatial resolutions, and wherein said preprocessed decoded versions further include information used to decode a third spatial resolution image, whose spatial resolution is higher than said second image, by up-sampling said first up-sampled image to yield a second intermediate image to which is added decompressed error data in a second enhancement layer contained in said embedded bit stream.

15. The decoder of claim 14, wherein said memory unit stores at least one of the following (i) an algorithm used to carry-out said decoding, (ii) look-up data used in decompressing said base layer data, (iii) look-up data associated with said error data in said first enhancement layer, (iv) preprocessed data representing up-sampling of said first up-sampled image, and (v) preprocessed data associated with said error data in said second enhancement layer.

16. The decoder of claim 12, wherein said tree-structured vector quantization has a tree depth R and a vector dimension k;

wherein bitstream bit rates 0/k, . . . , R/k are provided for said embedded bit stream.

17. The decoder of claim 12, wherein said read-only 
memory stores codewords arranged in a tree structure. 

18. The decoder of claim 12, wherein said vector quantization includes human perception modelling.

19. The decoder of claim 12, wherein: 

said server encodes spatial resolution data in said embedded bit stream in pixel blocks; and wherein:

said decoder provides error correction for said embedded bit stream using inverse vector quantization followed by inverse discrete cosine transformation of at least some of said pixel blocks, said inverse vector quantization and said inverse discrete cosine transformation being preprocessed and stored in said memory unit.

20. A method of decoding information embedded by a video server in an embedded bit stream having pixel blocks that include information for at least two spatial resolutions, said embedded bit stream being encoded by said server using discrete cosine transformation followed by tree-structured quantization, the method including the following steps:

(a) providing a stored set of preprocessed decoded ver- 
sions of quantized indexable representations of inverse 
discrete cosine transforms of image data used by said 
server in creating codewords present in said embedded 
bit stream; 

(b) processing said embedded bit stream to use informa- 
tion contained therein to index into said stored set of 
preprocessed decoded versions to decode at least one 
of: 

(i) a first spatial resolution image by decompressing 
base layer data contained within said embedded bit 
stream; 

(ii) a second, higher, spatial resolution image by decompressing said base layer data to obtain a first intermediate image and up-sampling said first intermediate image to yield a first up-sampled image to which is added decompressed error data in a first enhancement layer contained in said embedded bit stream; and

(iii) if said bit stream contains at least three spatial resolutions, a third image whose spatial resolution is higher than said second image by up-sampling said first up-sampled image to yield a second intermediate image to which is added decompressed error data in a second enhancement layer contained in said embedded bit stream;

wherein a viewable image is decodable by said decoder from codewords received in said embedded bit stream that contain less than a number of bits representing a full-length codeword.
